Proof of AutoML: SDN based Secure Energy Trading with Blockchain in Disaster Case

Toprak, Salih, Erel-Ozcevik, Muge

arXiv.org Artificial Intelligence

In disaster scenarios where conventional energy infrastructure is compromised, secure and traceable energy trading between solar-powered households and mobile charging units becomes a necessity. To ensure the integrity of such transactions over a blockchain network, robust and unpredictable nonce generation is vital. This study proposes an SDN-enabled architecture where machine learning regressors are leveraged not for their accuracy, but for their potential to generate randomized values suitable as nonce candidates. We call this approach Proof of AutoML. Here, SDN allows flexible control over data flows and energy routing policies even in fragmented or degraded networks, ensuring adaptive response during emergencies. Using a 9000-sample dataset, we evaluate five AutoML-selected regression models - Gradient Boosting, LightGBM, Random Forest, Extra Trees, and K-Nearest Neighbors - not by their prediction accuracy, but by their ability to produce diverse and non-deterministic outputs across shuffled data inputs. Randomness analysis reveals that Random Forest and Extra Trees regressors exhibit complete dependency on randomness, whereas Gradient Boosting, K-Nearest Neighbors and LightGBM show strong but slightly lower randomness scores (97.6%, 98.8% and 99.9%, respectively). These findings highlight that certain machine learning models, particularly tree-based ensembles, may serve as effective and lightweight nonce generators within blockchain-secured, SDN-based energy trading infrastructures resilient to disaster conditions.
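The abstract does not define how its randomness score is computed. As a rough illustration only, one plausible reading is the share of predictions that change when the same model is retrained on a reshuffled copy of the data. The function name `changed_fraction`, the tolerance parameter, and the sample values below are all assumptions, not the authors' method.

```python
def changed_fraction(run_a, run_b, tol=0.0):
    """Share of positions where two prediction runs differ by more than tol.

    Assumed stand-in for a "randomness score"; the paper may define it differently.
    """
    if len(run_a) != len(run_b):
        raise ValueError("runs must have the same length")
    if not run_a:
        raise ValueError("runs must not be empty")
    changed = sum(1 for a, b in zip(run_a, run_b) if abs(a - b) > tol)
    return changed / len(run_a)


# Hypothetical predictions from one model trained on two different shuffles.
run_1 = [10.2, 11.5, 9.8, 12.0]
run_2 = [10.2, 11.9, 9.1, 12.4]
score = changed_fraction(run_1, run_2)  # three of the four values differ
```

Under this reading, a score of 1.0 would mean every prediction changed between runs.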


Enhancing Essay Cohesion Assessment: A Novel Item Response Theory Approach

Rosa, Bruno Alexandre, Oliveira, Hilário, Rodrigues, Luiz, Oliveira, Eduardo Araujo, Mello, Rafael Ferreira

arXiv.org Artificial Intelligence

Essays are considered a valuable mechanism for evaluating learning outcomes in writing. Textual cohesion is an essential characteristic of a text, as it facilitates the establishment of meaning between its parts. Automatically scoring cohesion in essays presents a challenge in the field of educational artificial intelligence. The machine learning algorithms used to evaluate texts generally do not consider the individual characteristics of the instances that comprise the analysed corpus. In this sense, item response theory can be adapted to the context of machine learning, characterising the ability, difficulty and discrimination of the models used. This work proposes and analyses the performance of a cohesion score prediction approach based on item response theory to adjust the scores generated by machine learning models. In this study, the corpus selected for the experiments consisted of the extended Essay-BR, which includes 6,563 essays in the style of the National High School Exam (ENEM), and the Brazilian Portuguese Narrative Essays, comprising 1,235 essays written by 5th to 9th grade students from public schools. We extracted 325 linguistic features and treated the problem as a machine learning regression task. The experimental results indicate that the proposed approach outperforms conventional machine learning models and ensemble methods in several evaluation metrics. This research explores a potential approach for improving the automatic evaluation of cohesion in educational essays.
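For readers unfamiliar with the terms ability, difficulty and discrimination, the standard two-parameter logistic (2PL) item response model shows how they interact. This is a textbook formulation given for orientation; the abstract does not say which IRT variant the authors adapt, and the function name and parameter values are illustrative assumptions.

```python
import math


def two_pl(theta, a, b):
    """2PL IRT: probability that a respondent answers an item correctly.

    theta: respondent ability (here, by analogy, a model)
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (here, by analogy, an essay)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Illustrative values only.
p_equal = two_pl(theta=0.5, a=1.2, b=0.5)   # ability equals difficulty
p_easy = two_pl(theta=0.5, a=1.2, b=-1.0)   # same respondent, easier item
```

When ability equals difficulty the probability is exactly one half, and it rises as the item gets easier.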


Using ensemble methods of machine learning to predict real estate prices

Pastukh, Oleh, Khomyshyn, Viktor

arXiv.org Artificial Intelligence

In recent years, machine learning (ML) techniques have become a powerful tool for improving the accuracy of predictions and decision-making. Machine learning technologies have begun to penetrate all areas, including the real estate sector. Correct forecasting of real estate value plays an important role in the buyer-seller chain, because it ensures reasonableness of price expectations based on the offers available in the market and helps to avoid financial risks for both parties of the transaction. Accurate forecasting is also important for real estate investors to make an informed decision on a specific property. This study helps to gain a deeper understanding of how effective and accurate ensemble machine learning methods are in predicting real estate values. The results obtained in the work are quite accurate, as can be seen from the coefficient of determination (R^2), root mean square error (RMSE) and mean absolute error (MAE) calculated for each model. The Gradient Boosting Regressor model provides the highest accuracy; the Extra Trees Regressor, Hist Gradient Boosting Regressor and Random Forest Regressor models also give good results. In general, ensemble machine learning techniques can be effectively used to solve real estate valuation problems. This work suggests directions for future research: preprocessing the data set to find and remove anomalous values, and the practical implementation of the obtained results.
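The three metrics named in this abstract have standard definitions, sketched below in plain Python. The function name and the price values are hypothetical; they are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,   # coefficient of determination
        "rmse": math.sqrt(ss_res / n),  # root mean square error
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical prices and predictions, in arbitrary currency units.
prices = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(prices, predicted)
```

RMSE and MAE are in the units of the target, while R^2 is unitless, which is why the three are usually reported together.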


Empirical modeling and hybrid machine learning framework for nucleate pool boiling on microchannel structured surfaces

Kuberan, Vijay, Gedupudi, Sateesh

arXiv.org Artificial Intelligence

Micro-structured surfaces influence nucleation characteristics and bubble dynamics besides increasing the heat transfer surface area, thus enabling efficient nucleate boiling heat transfer. Modeling the pool boiling heat transfer characteristics of these surfaces under varied conditions is essential in diverse applications. A new empirical correlation for nucleate boiling on microchannel structured surfaces has been proposed with the data collected from various experiments in previous studies since the existing correlations are limited by their accuracy and narrow operating ranges. This study also examines various Machine Learning (ML) algorithms and Deep Neural Networks (DNN) on the microchannel structured surfaces dataset to predict the nucleate pool boiling Heat Transfer Coefficient (HTC). To integrate ML with domain knowledge, a Physics-Informed Machine Learning Aided Framework (PIMLAF) is proposed. The proposed correlation in this study is employed as the prior physics-based model for PIMLAF, and a DNN is employed to model the residuals of the prior model. This hybrid framework achieved the best performance in comparison to the other ML models and DNNs. This framework is able to generalize well for different datasets because the proposed correlation provides the baseline knowledge of the boiling behavior. Also, SHAP interpretation analysis identifies the critical parameters impacting the model predictions and their effect on HTC prediction. This analysis further makes the model more robust and reliable.
Keywords: Pool boiling, Microchannels, Heat transfer coefficient, Correlation analysis, Machine learning, Deep neural network, Physics-informed machine learning aided framework, SHAP analysis
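The residual-learning idea behind the hybrid framework can be sketched in a few lines: a prior model gives a baseline prediction, and a second model is fitted to what the prior gets wrong. In the sketch below the prior `prior_htc` is a made-up power law, not the paper's correlation, and an ordinary least-squares line stands in for the DNN; the data are synthetic.

```python
def prior_htc(heat_flux):
    """Hypothetical physics-based prior (NOT the correlation from the paper)."""
    return 2.0 * heat_flux ** 0.7


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic "measurements": the prior plus a linear effect the prior misses.
heat_flux = [10.0, 20.0, 40.0, 80.0]
measured = [prior_htc(q) + 0.05 * q + 1.0 for q in heat_flux]

# Step 1: residuals of the prior model on the training data.
residuals = [m - prior_htc(q) for q, m in zip(heat_flux, measured)]

# Step 2: fit the residual model (a line here; a DNN in the paper).
slope, intercept = fit_line(heat_flux, residuals)


def hybrid_htc(q):
    """Hybrid prediction: prior baseline plus learned residual correction."""
    return prior_htc(q) + slope * q + intercept
```

Because the learned part only has to capture the correction, the prior still supplies a sensible baseline outside the training range, which is consistent with the generalization argument in the abstract.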


Feasibility of machine learning-based rice yield prediction in India at the district level using climate reanalysis data

De Clercq, Djavan, Mahdi, Adam

arXiv.org Artificial Intelligence

Yield forecasting, the science of predicting agricultural productivity before the crop harvest occurs, helps a wide range of stakeholders make better decisions around agricultural planning. This study aims to investigate whether machine learning-based yield prediction models can capably predict Kharif season rice yields at the district level in India several months before the rice harvest takes place. The methodology involved training 19 machine learning models such as CatBoost, LightGBM, Orthogonal Matching Pursuit, and Extremely Randomized Trees on 20 years of climate, satellite, and rice yield data across 247 of India's rice-producing districts. In addition to model-building, a dynamic dashboard was built to understand how the reliability of rice yield predictions varies across districts. The results of the proof-of-concept machine learning pipeline demonstrated that rice yields can be predicted with a reasonable degree of accuracy, with out-of-sample R^2, MAE, and MAPE performance of up to 0.82, 0.29, and 0.16 respectively. These results outperformed test set performance reported in related literature on rice yield modeling in other contexts and countries. In addition, SHAP value analysis was conducted to infer both the importance and directional impact of the climate and remote sensing variables included in the model. Important features driving rice yields included temperature, soil water volume, and leaf area index. In particular, higher temperatures in August correlate with increased rice yields, particularly when the leaf area index in August is also high. Building on the results, a proof-of-concept dashboard was developed to allow users to easily explore which districts may experience a rise or fall in yield relative to the previous year.


Machine learning in the prediction of cardiac epicardial and mediastinal fat volumes

Rodrigues, É. O., Pinheiro, V. H. A., Liatsis, P., Conci, A.

arXiv.org Artificial Intelligence

We propose a methodology to predict the cardiac epicardial and mediastinal fat volumes in computed tomography images using regression algorithms. The obtained results indicate that it is feasible to predict these fats with a high degree of correlation, thus alleviating the requirement for manual or automatic segmentation of both fat volumes. Instead, segmenting just one of them suffices, while the volume of the other may be predicted fairly precisely. The correlation coefficient obtained by the Rotation Forest algorithm using MLP Regressor for predicting the mediastinal fat based on the epicardial fat was 0.9876, with a relative absolute error of 14.4% and a root relative squared error of 15.7%. The best correlation coefficient obtained in the prediction of the epicardial fat based on the mediastinal was 0.9683 with a relative absolute error of 19.6% and a relative squared error of 24.9%. Moreover, we analysed the feasibility of using linear regressors, which provide an intuitive interpretation of the underlying approximations. In this case, the obtained correlation coefficient was 0.9534 for predicting the mediastinal fat based on the epicardial, with a relative absolute error of 31.6% and a root relative squared error of 30.1%. On the prediction of the epicardial fat based on the mediastinal fat, the correlation coefficient was 0.8531, with a relative absolute error of 50.43% and a root relative squared error of 52.06%. In summary, it is possible to speed up general medical analyses and some segmentation and quantification methods that are currently employed in the state-of-the-art by using this prediction approach, which consequently reduces costs and therefore enables preventive treatments that may lead to a reduction of health problems.
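The error measures quoted above compare a model against a trivial predictor that always outputs the mean of the actual values. A minimal sketch of the usual definitions follows; the function name and sample volumes are illustrative and not taken from the paper.

```python
import math


def relative_errors(actual, predicted):
    """Return (correlation, relative absolute error, root relative squared error).

    RAE and RRSE are relative to always predicting the mean of `actual`;
    multiply by 100 to express them as percentages.
    """
    n = len(actual)
    if n < 2 or n != len(predicted):
        raise ValueError("need at least two paired values")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    abs_dev = sum(abs(a - mean_a) for a in actual)
    sq_dev_a = sum((a - mean_a) ** 2 for a in actual)
    sq_dev_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    corr = cov / math.sqrt(sq_dev_a * sq_dev_p)
    return corr, abs_err / abs_dev, math.sqrt(sq_err / sq_dev_a)


# Illustrative volumes: predictions are offset by a constant 0.5.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
corr, rae, rrse = relative_errors(actual, predicted)
```

In this toy case the correlation is perfect while the relative absolute error is 50%, which illustrates why a high correlation coefficient can sit alongside a sizeable relative error, as in the linear-regressor figures reported above.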


Implementing Fair Regression In The Real World

Ruf, Boris, Detyniecki, Marcin

arXiv.org Artificial Intelligence

The potential risk of machine learning algorithms to unintentionally embed and reproduce bias and therefore discriminating various sub populations in high-stakes decision-making applications has given rise to the new research field of fair machine learning (Kamiran and Calders 2009; Corbett-Davies et al. 2018; Barocas, Hardt, and Narayanan 2019). Plenty of quantitative measures of fairness have been proposed (Dwork et al. 2011; Hardt, Price, and Srebro 2016; Chouldechova 2017; Berk et al. 2018) which opened up the way for three types of algorithms that seek to satisfy them: First, the pre-processing approach which modifies the data representation prior to using classical algorithms (Kamiran and Calders 2012; Zemel et al. 2013). Second, the in-processing approach which intervenes during the learning phase by adding a fairness constraint to the optimization objective (Kamishima et al. 2012; Zafar et al. 2017; Zhang, Lemoine, and Mitchell 2018). Third, the post-processing approach which adjusts the outputs of classical ...

In a business context where an unconstrained real-world application were to be replaced with a fairer one, such extreme discrepancies would not be viable because individuals who were substantially negatively impacted would probably not accept the change and switch to a competitor. Based on our findings, we therefore propose algorithmic post-processing procedures to adjust for unwanted, extreme discrepancies between unconstrained and fair methods in order to enable a smooth transition from an "unfair" to a fairer model. The main contributions of this paper are:

- We empirically examine the evolution of fair regression outputs compared to unconstrained predictors and demonstrate that some variations on the individual level may be unacceptable in practice. To the best of our knowledge we offer the first investigation of this kind;
- We propose a range of post-processing algorithms to mitigate this effect and therefore provide mechanisms to implement fair regression in practice.
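The excerpt does not describe the proposed post-processing algorithms. As an illustration of the general idea only, the sketch below caps how far each individual's fair prediction may move away from the unconstrained one. The function `limit_discrepancy`, the `max_shift` parameter and the sample values are assumptions and should not be read as the authors' procedure.

```python
def limit_discrepancy(unconstrained, fair, max_shift):
    """Clip each fair prediction to within max_shift of the unconstrained one.

    Hypothetical post-processing rule: it trades some fairness for continuity
    at the individual level.
    """
    if max_shift < 0:
        raise ValueError("max_shift must be non-negative")
    if len(unconstrained) != len(fair):
        raise ValueError("prediction lists must have the same length")
    adjusted = []
    for base, target in zip(unconstrained, fair):
        lower, upper = base - max_shift, base + max_shift
        adjusted.append(min(max(target, lower), upper))
    return adjusted


# Illustrative predictions for three individuals.
baseline = [100.0, 200.0, 300.0]
fair = [105.0, 150.0, 340.0]
smoothed = limit_discrepancy(baseline, fair, max_shift=20.0)
```

With a cap of 20, the first prediction is left alone and the other two are pulled back to 180 and 320. A rule like this would need its effect on the chosen fairness measure checked afterwards, since clipping can reintroduce some of the disparity the fair model removed.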